
    Cross-Validating and Bagging Partitioning Algorithms with Variable Importance

    We present a cross-validated bagging scheme in the context of partitioning algorithms. To explore the benefits of the various bagging schemes, we use simulations to compare the predictive ability of a single Classification and Regression Tree (CART) with several previously suggested bagging schemes and with our proposed approach. Additionally, a variable importance measure is explained and illustrated.
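    The abstract does not spell out the proposed cross-validated scheme, so the following is only a minimal sketch of ordinary bootstrap bagging of CART trees, restricted to depth-1 trees (stumps) to stay short, with out-of-bag (OOB) error and a permutation-based variable importance. The function names, the stump base learner, and the toy data are illustrative assumptions, not the authors' method.

```python
import random

def avg(values):
    values = list(values)
    return sum(values) / len(values)

def fit_stump(X, y):
    """Depth-1 CART regression tree: try every feature and every midpoint
    threshold, keep the split with the smallest residual sum of squares."""
    best = None
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        for k in range(1, len(order)):
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # no threshold separates identical values
            left = [y[i] for i in order[:k]]
            right = [y[i] for i in order[k:]]
            ml, mr = avg(left), avg(right)
            rss = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or rss < best[0]:
                best = (rss, j, (lo + hi) / 2, ml, mr)
    if best is None:  # every feature is constant: predict the overall mean
        m = avg(y)
        return lambda row: m
    _, j, t, ml, mr = best
    return lambda row: ml if row[j] <= t else mr

def bag(X, y, n_trees=50, seed=0):
    """Fit one stump per bootstrap sample; remember which rows were left out."""
    rng = random.Random(seed)
    n = len(X)
    trees, oob_sets = [], []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
        oob_sets.append(set(range(n)) - set(idx))
    return trees, oob_sets

def oob_mse(trees, oob_sets, X, y):
    """Each row is predicted only by the trees whose bootstrap sample missed it."""
    errs = []
    for i in range(len(X)):
        preds = [t(X[i]) for t, oob in zip(trees, oob_sets) if i in oob]
        if preds:
            errs.append((avg(preds) - y[i]) ** 2)
    return avg(errs)

def permutation_importance(trees, oob_sets, X, y, j, seed=1):
    """Increase in OOB error after shuffling feature j (larger = more important)."""
    col = [row[j] for row in X]
    random.Random(seed).shuffle(col)
    Xp = [tuple(row[:j]) + (col[i],) + tuple(row[j + 1:]) for i, row in enumerate(X)]
    return oob_mse(trees, oob_sets, Xp, y) - oob_mse(trees, oob_sets, X, y)

rng = random.Random(42)
X = [(rng.random(), rng.random()) for _ in range(40)]
# Toy response: only feature 0 carries signal (an illustrative assumption).
y = [3.0 * (a > 0.5) + rng.gauss(0, 0.1) for a, _ in X]
trees, oob = bag(X, y)
print("OOB MSE:", oob_mse(trees, oob, X, y))
print("importance:", [permutation_importance(trees, oob, X, y, j) for j in (0, 1)])
```

    Because the OOB rows play the role of a held-out set, the same OOB error serves both as a performance estimate and as the baseline for the importance measure.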

    Comparative Genomic Hybridization Array Analysis

    There is increasing evidence that cancer may be regulated by the number of copies of genes in tumor cells. Microarray technology now makes it possible to measure the copy number of thousands of genes and gene segments in samples of chromosomal DNA. Microarray comparative genomic hybridization (array CGH) provides the opportunity both to measure DNA sequence copy-number gains and losses and to map these aberrations to the genomic sequence. Gains can signify over-expression of oncogenes (genes that stimulate cell growth and have become hyperactive), while losses can signify under-expression of tumor suppressor genes (genes whose activity prevents the formation of tumors). To better understand the progression of cancer and the differences between cancer and non-cancer tissue, it is important to understand fully what is happening at the chromosomal level. In the hope of finding a genetic signature for subtypes of cancer, we explore statistical approaches to array CGH data. The Waldman Lab at UCSF-CCC graciously gave us access to data from their renal cancer study. This project was designed to determine whether microarray information on gene copy number could be used to discriminate among four subtypes of renal cancer.
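    Array CGH signals are commonly summarized per probe as the log2 ratio of tumor to reference intensity; for a diploid reference, an ideal one-copy gain gives log2(3/2), about 0.58, and a one-copy loss gives log2(1/2) = -1. The sketch below calls gains and losses with a fixed cutoff. The 0.3 threshold, the function names, and the intensities are hypothetical; real analyses usually smooth or segment the ratios along the genome first, and this is not the project's own pipeline.

```python
import math

def log2_ratios(test, reference):
    """Per-probe log2 ratio of tumor (test) to normal (reference) intensity."""
    return [math.log2(t / r) for t, r in zip(test, reference)]

def call_aberrations(ratios, threshold=0.3):
    """Label each probe 'gain', 'loss' or 'neutral' with a symmetric
    log2-ratio cutoff; 0.3 is an illustrative assumption, not a standard."""
    calls = []
    for r in ratios:
        if r >= threshold:
            calls.append("gain")
        elif r <= -threshold:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls

# Hypothetical intensities for four probes (tumor vs. normal reference DNA).
tumor = [200.0, 100.0, 50.0, 105.0]
normal = [100.0, 100.0, 100.0, 100.0]
ratios = log2_ratios(tumor, normal)
print(call_aberrations(ratios))  # ['gain', 'neutral', 'loss', 'neutral']
```

    Per-sample call vectors of this kind could then serve as features for a classifier that discriminates among tumor subtypes.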

    Characterization of Metabolic, Diffusion, and Perfusion Properties in GBM: Contrast-Enhancing versus Non-Enhancing Tumor.

    Background: Although the contrast-enhancing (CE) lesion on T1-weighted MR images is widely used as a surrogate for glioblastoma (GBM), there are also non-enhancing regions of infiltrative tumor within the T2-weighted lesion that elude radiologic detection. Because non-enhancing GBM (Enh-) challenges clinical patient management as latent disease, this study sought to characterize ex vivo metabolic profiles from Enh- and CE GBM (Enh+) samples, alongside histological and in vivo MR parameters, to help define criteria for estimating total tumor burden.
    Methods: Fifty-six patients with newly diagnosed GBM received a multi-parametric pre-surgical MR examination. Targets for image-guided tissue sampling were defined based on in vivo parameters that were suspicious for tumor. The actual location from which each tissue sample was obtained was recorded, and half of each sample was analyzed for histopathology while the other half was scanned using HR-MAS spectroscopy.
    Results: The Enh+ and Enh- tumor samples demonstrated comparable mitotic activity but significant heterogeneity in microvascular morphology. Ex vivo spectroscopic parameters indicated similar levels of total choline and N-acetylaspartate (NAA) between these contrast-based radiographic subtypes of GBM, and characteristic differences in the levels of myo-inositol, creatine/phosphocreatine, and phosphoethanolamine. Analysis of in vivo parameters at the sample locations was consistent with the histological and ex vivo metabolic data.
    Conclusions: The similarity between ex vivo levels of choline and NAA, and between in vivo levels of choline, NAA, and nADC, in Enh+ and Enh- tumor indicates that these parameters can be used to define non-invasive metrics of total tumor burden for patients with GBM.

    Survival Point Estimate Prediction in Matched and Non-Matched Case-Control Subsample Designed Studies

    Providing information about the risk of disease, and about clinical factors that may increase or decrease a patient's risk, is standard medical practice. Although case-control studies can provide evidence of strong associations between diseases and risk factors, clinicians need to be able to communicate to patients the age-specific risk of disease over a defined time interval for a given set of risk factors. Absolute risk cannot be estimated from conventional case-control studies, because cases are generally chosen from a population whose size is unknown (necessary for calculating absolute risk) and whose duration of follow-up is unknown (necessary for calculating incidence). This problem can sometimes be overcome by using a nested case-control design. We have collected data from a National Cancer Institute-funded, population-based cohort study that contains a matched set of cases and controls within the cohort. This design is more cost-efficient than a full cohort study because expensive predictor variables (genomic measures, sex hormone levels, mammographic breast density) are measured on all of the cases but on only a sample of the cohort members who did not develop the outcome of interest (the controls). In addition, this design avoids the potential biases of conventional case-control studies that draw cases and controls from different populations. Importantly, the presence or absence of the outcome of interest has been established for the entire cohort within the same time period. The specifics of the sampling in our study do not satisfy the assumptions of absolute risk estimation methods previously developed in the literature. Here we introduce a novel method that provides locally efficient estimators of the absolute risk in the cohort using measures taken only on the matched case-control participants.
    The proposed method is evaluated using simulation studies and survival data from women with ductal carcinoma in situ, a non-invasive form of breast cancer. A generalization of the proposed method relates it to other, similar sampling designs such as the nested case-control, case-cohort, and two-stage case-control designs.
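    The locally efficient estimator itself is not described in the abstract. As a simpler, well-known point of comparison, the sketch below computes an inverse-probability-weighted Kaplan-Meier curve on a case-control subsample: every case gets weight 1, and each sampled control gets weight 1/p, where p is its probability of being selected (assumed known here). Absolute risk by time t is then 1 - S(t). The function name, weights, and data are hypothetical.

```python
def weighted_km(times, events, weights):
    """Weighted Kaplan-Meier survival estimate.
    times: follow-up times; events: 1 = outcome observed, 0 = censored;
    weights: inverse sampling probabilities (1 for cases, 1/p for controls).
    Returns (time, survival) pairs at each distinct event time."""
    data = sorted(zip(times, events, weights))
    at_risk = sum(weights)  # weighted size of the risk set
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_w = 0.0  # weighted number of events at time t
        n_w = 0.0  # total weight leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            d_w += data[i][2] * data[i][1]
            n_w += data[i][2]
            i += 1
        if d_w > 0:
            surv *= 1.0 - d_w / at_risk
            curve.append((t, surv))
        at_risk -= n_w
    return curve

# Hypothetical subsample: 2 cases (weight 1) and 4 controls, each assumed
# to have been sampled with probability 0.5 (weight 2).
times = [2, 5, 3, 4, 6, 7]
events = [1, 1, 0, 0, 0, 0]
weights = [1, 1, 2, 2, 2, 2]
curve = weighted_km(times, events, weights)
print(curve)
print("absolute risk by t=5:", 1 - curve[-1][1])  # about 0.28
```

    The weights rebuild the cohort's risk sets from the subsample; here the 6 sampled people stand in for a weighted cohort of 10.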

    Metabolic Profiling of IDH Mutation and Malignant Progression in Infiltrating Glioma.

    Infiltrating low-grade gliomas (LGGs) are heterogeneous in their behavior, and the strategies used for their clinical management are highly variable. A key factor in clinical decision-making is that patients with mutations in the isocitrate dehydrogenase 1 and 2 (IDH1/2) oncogenes are more likely to have a favorable outcome and to be sensitive to treatment. Because of their relatively long overall median survival, more aggressive treatments are typically reserved for patients who have undergone malignant progression (MP) to an anaplastic glioma or secondary glioblastoma (GBM). In the current study, ex vivo metabolic profiles of image-guided tissue samples obtained from patients with newly diagnosed and recurrent LGG were investigated using proton high-resolution magic angle spinning spectroscopy (1H HR-MAS). Distinct spectral profiles were observed for lesions with IDH-mutated genotypes, between astrocytoma and oligodendroglioma histologies, and for tumors that had undergone MP. Levels of 2-hydroxyglutarate (2HG) were correlated with increased mitotic activity, axonal disruption, vascular neoplasia, and several brain metabolites, including the choline species, glutamate, glutathione, and GABA. The information obtained in this study may be used to develop strategies for in vivo characterization of infiltrative glioma, in order to improve disease stratification and to assist in monitoring response to therapy.

    Loss-Based Estimation with Cross-Validation: Applications to Microarray Data Analysis and Motif Finding

    Current statistical inference problems in genomic data analysis involve parameter estimation for high-dimensional multivariate distributions, typically with unknown and intricate correlation patterns among variables. Addressing these inference questions satisfactorily requires: (i) an intensive and thorough search of the parameter space to generate good candidate estimators, (ii) an approach for selecting an optimal estimator among these candidates, and (iii) a method for reliably assessing the performance of the resulting estimator. We propose a unified loss-based methodology for estimator construction, selection, and performance assessment with cross-validation. In this approach, the parameter of interest is defined as the risk minimizer for a suitable loss function, and candidate estimators are generated using this (or possibly another) loss function. Cross-validation is applied to select an optimal estimator among the candidates and to assess the overall performance of the resulting estimator. This general estimation framework encompasses a number of problems that have traditionally been treated separately in the statistical literature, including multivariate outcome prediction and density estimation based on either uncensored or censored data. This article provides an overview of the methodology and describes its application to two problems in genomic data analysis: the prediction of biological and clinical outcomes (possibly censored) using microarray gene expression measures, and the identification of regulatory motifs (i.e., transcription factor binding sites) in DNA sequences.
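    The selection step can be illustrated with a minimal sketch: a family of candidate estimators is indexed by a tuning parameter, each candidate's risk under a chosen loss is estimated by V-fold cross-validation, and the candidate with the smallest estimated risk is kept. The k-nearest-neighbour candidates, squared-error loss, function names, and toy data are assumptions made for illustration, not the article's own implementation.

```python
import random

def avg(values):
    values = list(values)
    return sum(values) / len(values)

def knn_predictor(train, k):
    """Hypothetical candidate estimator: 1-D k-nearest-neighbour regression."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return avg(yv for _, yv in nearest)
    return predict

def squared_error(y, y_hat):
    return (y - y_hat) ** 2

def cv_risk(data, k, loss, folds=5, seed=0):
    """V-fold cross-validated risk of the k-NN candidate under `loss`."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)  # same seed -> same folds for every k
    losses = []
    for v in range(folds):
        valid = set(idx[v::folds])
        train = [data[i] for i in idx if i not in valid]
        predict = knn_predictor(train, k)
        losses.extend(loss(data[i][1], predict(data[i][0])) for i in valid)
    return avg(losses)

def select_by_cv(data, candidates, loss=squared_error):
    """Pick the candidate with the smallest cross-validated risk."""
    risks = {k: cv_risk(data, k, loss) for k in candidates}
    return min(risks, key=risks.get), risks

# Toy data with an alternating +1/-1 response, where averaging over many
# neighbours beats copying the single nearest one.
data = [(float(i), (-1.0) ** i) for i in range(50)]
best_k, risks = select_by_cv(data, [1, 3, 20])
print(best_k, risks)
```

    Reusing the same folds for every candidate keeps the comparison paired; assessing the selected estimator's own performance would need a further, outer layer of cross-validation.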

    Optimal tumor sampling for immunostaining of biomarkers in breast carcinoma

    Introduction: Biomarkers such as estrogen receptor (ER) are used to determine therapy and prognosis in breast carcinoma. Immunostaining assays of biomarker expression have a high rate of inaccuracy; for estrogen receptor, for example, estimates are as high as 20%. Biomarkers have been shown to be heterogeneously expressed in breast tumors, and this heterogeneity may contribute to the inaccuracy of immunostaining assays. Currently, no evidence-based standards exist for the amount of tumor that must be sampled in order to correct for biomarker heterogeneity. The aim of this study was to determine the optimal number of 20X fields needed to obtain a measurement representative of expression in a whole tissue section for selected biomarkers: ER, HER-2, AKT, ERK, S6K1, GAPDH, cytokeratin, and MAP-Tau.
    Methods: Two collections of whole tissue sections of breast carcinoma were immunostained for the biomarkers. Expression was quantified using the Automated Quantitative Analysis (AQUA) method of quantitative immunofluorescence. Simulated sampling of various numbers of fields (from one to thirty-five) was performed for each marker. The optimal number was selected for each marker via resampling techniques and minimization of prediction error over an independent test set.
    Results: The optimal number of 20X fields varied by biomarker, ranging from three to fourteen fields. More heterogeneous markers, such as MAP-Tau protein, required a larger sample of 20X fields to produce a representative measurement.
    Conclusions: The optimal number of 20X fields that must be sampled to produce a representative measurement of biomarker expression varies by marker, with more heterogeneous markers requiring a larger number. The clinical implication of these findings is that breast biopsies consisting of a small number of fields may be inadequate to represent whole-tumor biomarker expression for many markers. Additionally, for biomarkers newly introduced into clinical use, especially if therapeutic response is dictated by level of expression, the optimal size of tissue sample must be determined on a marker-by-marker basis.
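    The study's exact selection procedure (prediction error on an independent test set) is not detailed here, so the sketch below shows only the core resampling idea: for each number of fields m, repeatedly draw m fields at random from one section and measure how far their average falls from the whole-section average. The tolerance rule, function names, and per-field scores are hypothetical stand-ins, not AQUA output or the study's criterion.

```python
import random

def sampling_error(fields, m, reps=500, seed=0):
    """Resampling estimate of the mean squared difference between the
    average of m randomly chosen fields and the whole-section average."""
    rng = random.Random(seed)
    target = sum(fields) / len(fields)
    errors = [(sum(rng.sample(fields, m)) / m - target) ** 2 for _ in range(reps)]
    return sum(errors) / reps

def smallest_adequate_m(fields, tol):
    """Smallest number of fields whose resampled error is at most `tol`.
    This tolerance rule is a hypothetical stand-in for the study's criterion."""
    for m in range(1, len(fields) + 1):
        if sampling_error(fields, m) <= tol:
            return m
    return len(fields)

# Hypothetical per-field expression scores for two markers (35 fields each).
homogeneous = [100 + i % 3 for i in range(35)]          # values 100-102
heterogeneous = [50 + 40 * (i % 2) for i in range(35)]  # values 50 or 90
m_homog = smallest_adequate_m(homogeneous, tol=1.0)
m_hetero = smallest_adequate_m(heterogeneous, tol=1.0)
print(m_homog, m_hetero)
```

    As in the study's results, the more heterogeneous toy marker needs many more fields before the sampled average reliably matches the whole section.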